Voice-operated services

ABSTRACT

A method and apparatus accesses a database where entries are linked to at least two sets of patterns. One or more patterns of a first set of patterns are recognized within a received signal. The recognized patterns are used to identify entries and compile a list of patterns in a second set of patterns to which those entries are also linked. The list is then used to recognize a second received signal. The received signals may, for example, be voice signals or signals indicating the origin or destination of the received signals.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention is concerned with automated voice-interactive services employing speech recognition, particularly, though not exclusively, for use over a telephone network.

2. Related Art

A typical application is an enquiry service where a user is asked a number of questions in order to elicit replies which, after recognition by a speech recogniser, permit access to one or more desired entries in an information bank. An example of this is a directory enquiry system in which a user, requiring the telephone number of a telephone subscriber, is asked to give the town name and road name of the subscriber's address, and the subscriber's surname.

SUMMARY OF THE INVENTION

According to one aspect of the present invention there is provided a speech recognition apparatus comprising a store of data containing entries to be identified and information defining for each entry a connection with a word of a first set of words and a connection with a word of a second set of words; speech recognition means; and control means operable:

(a) so to control the speech recognition means as to identify by reference to recognition information for the first set of words as many words of the first set as meet a predetermined criterion of similarity to first received voice signals;

(b) upon such identification, to compile a list of all words of the second set which are defined as connected with entries defined as connected also with the identified word(s) of the first set; and

(c) so to control the speech recognition means as to identify by reference to recognition information for the second set of words one or more words of the list which resemble(s) second received voice signals.

Preferably the speech recognition means is operable upon receipt of the first voice signal to generate for each identified word a measure of similarity with the first voice signal, and the control means is operable to generate for each word of the list a measure obtained from the measure(s) for the relevant word(s) of the first set (i.e. those identified words of the first set with which a word of the list has a common entry). The speech recognition means is then operable upon receipt of the second voice signal to perform the identification of one or more words of the list in accordance with a recognition process weighted in dependence on the measures generated for the words of the list.
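By way of illustration only, the derivation of a measure for each word of the list might be sketched as follows. The text does not say how the measure is obtained from the first-set measures, so taking the maximum, and multiplying it into the second-stage score, are assumptions made here. The entry data and the names `road_priors` and `weighted_best` are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical entries: each pairs a first-set word (town) with a
# second-set word (road). Real entries would carry more fields.
ENTRIES = [
    ("Ipswich", "High Street"),
    ("Ipswich", "Mill Lane"),
    ("Norwich", "High Street"),
    ("Norwich", "Castle Road"),
    ("Harwich", "Quay Street"),
]


def road_priors(town_scores, entries=ENTRIES):
    """Give each listed road the best score among the identified towns
    with which it shares an entry (max is an assumed combination rule)."""
    priors = defaultdict(float)
    for town, road in entries:
        if town in town_scores:
            priors[road] = max(priors[road], town_scores[town])
    return dict(priors)


def weighted_best(acoustic_scores, priors):
    """Pick the listed road maximising acoustic score times prior
    (one possible weighting; roads outside the list are never chosen)."""
    return max(priors, key=lambda r: acoustic_scores.get(r, 0.0) * priors[r])


priors = road_priors({"Ipswich": 0.9, "Norwich": 0.4})
choice = weighted_best({"Mill Lane": 0.5, "Castle Road": 0.8}, priors)
```

In this sketch a road reached only through a weakly recognised town needs a correspondingly stronger second-stage match to be selected.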

The apparatus may also include a store containing recognition data for all words of the second set and the control means is operable following the compilation of the list and before recognition of the word(s) of the list to mark in the recognition data store those items of data therein which correspond to the words not in the list or those which correspond to words which are in the list, whereby the recognition means may ignore all words so marked or, respectively, not marked.

Alternatively the recognition data may be generated dynamically either before recognition or during recognition, the control means being operable following the compilation of the list to generate recognition data for each word of the list. Methods for dynamically generating recognition data fall outside the scope of the present invention but will be clear to those skilled in this art.

Preferably the control means is operable to select for output that entry or entries defined as connected both with an identified word(s) of the first set and an identified word of the second set.

The store of data may also contain information defining for each entry a connection with a word of a third set of words, the control means being operable:

(d) to compile a list of all words of the third set which are defined as connected with entries each of which is also defined as connected both with an identified word of the first set and an identified word of the second set; and

(e) so to control the speech recognition means as to identify by reference to stored recognition information for the third set of words one or more words of the list which resemble(s) third received voice signals.

Furthermore, means may be included to store at least one of the received voice signals, the apparatus being arranged to perform an additional recognition process in which the control means is operable:

(a) so to control the speech recognition means as to identify by reference to stored recognition information for the second set of words a plurality of words of the second set which meet a predetermined criterion of similarity to the second received voice signals;

(b) to compile an additional list of all words of the first set which are defined as connected with entries defined as connected also with the identified words of the second set; and

(c) so to control the speech recognition means as to identify by reference to stored recognition information for the first set of words one or more words of the said additional list which resemble(s) the first received voice signals.

Preferably the apparatus includes means to recognise a failure condition and to initiate the said additional recognition process only in the event of such failure being recognised.

The apparatus may comprise a telephone line connection; a speech recogniser for recognising spoken words received via the telephone line connection, by reference to recognition data representing a set of possible utterances; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying a subset of the set of utterances and to restrict the recogniser operation to that subset.

According to a further aspect of the invention, a telephone apparatus comprises a telephone line connection; a speech recogniser for determining or verifying the identity of the speaker of spoken words received via the telephone line connection, by reference to recognition data corresponding to a set of possible speakers; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying a subset of the set of speakers and to restrict the recogniser operation to that subset.

According to a yet further aspect of the invention, a telephone information apparatus comprises a telephone line connection; a speech recogniser for recognising spoken words received via the telephone line connection, by reference to one of a plurality of stored sets of recognition data; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying one of the sets of recognition data and to supply this set to the recogniser.

The stored sets may, for example, correspond to different languages or regional accents or, say, two of the sets may correspond to the characteristics of different types of telephone apparatus, for instance the characteristics of a mobile telephone channel.

According to a further aspect of the invention a recognition apparatus comprises

a store defining a first set of patterns;

a store defining a second set of patterns;

a store containing entries to be identified;

a store containing information relating each entry to a pattern of the first set and to a pattern of the second set;

recognition means operable upon receipt of a first input pattern signal to identify as many patterns of the first set as meet a predetermined recognition criterion;

means to generate a list of all patterns of the second set which are related to an entry to which an identified pattern(s) of the first set is also related; and

recognition means operable upon receipt of a second input pattern signal to identify one or more patterns of the list.

The patterns may represent speech and the recognition means be a speech recogniser.

In accordance with the invention, a speech recognition apparatus comprises

(i) a store of data containing entries to be identified and information defining for each entry a connection with a signal of a first set of signals and a connection with a word of a second set of words;

(ii) means for identifying a received signal as corresponding to as many signals of the first set as meet a predetermined criterion;

(iii) control means operable to compile a list of all words of the second set which are defined as connected with entries defined as connected also with the identified signal(s) of the first set; and

(iv) speech recognition means operable to identify by reference to stored recognition information for the second set of words one or more words of the list which resemble(s) received voice signals.

Preferably the first set of signals are voice signals representing spelled versions of the words of the second set or initial portions thereof and the identifying means are formed by the speech recognition means operating by reference to stored recognition information for the said spelled voice signals. Alternatively the first set of signals may be signals consisting of tones and the identifying means is a tone recogniser. The first set of signals may indicate the origin or destination of the received signal.

In accordance with a further aspect of the invention, a method of identifying entries in a store of data by reference to stored information defining connections between entries and words, comprises

(a) identifying one or more of the said words as present in received voice signals;

(b) compiling a list of those of the said words defined as connected with entries defined as connected also with the identified word(s);

(c) identifying one or more of the words of the list as present in the received voice signals.

In a further aspect of the invention a speech recognition apparatus comprises

(a) a store of data containing entries to be identified and information defining for each entry a connection with at least two words;

(b) a speech recognition means able to identify by reference to stored recognition information for a defined set of words at least one word or word sequence which meets some predefined criterion of similarity to a received voice signal;

(c) a control means operable:

(i) to compile a list of words which are defined as connected with entries defined as connected with a word previously identified by the speech recognition means; and

(ii) so to control the speech recognition means as to identify by reference to stored recognition information for the compiled list one or more words or word sequences which resemble a further received voice signal.

A method of speech recognition by reference to a stored set of words to be recognised, according to the invention comprises

(a) receiving a speech signal;

(b) storing the speech signal;

(c) receiving a second signal;

(d) compiling a list of words, being a subset of the set of words, as a function of the second signal;

(e) applying to the stored speech signal a speech recognition process so as to identify by reference to the list one or more words of the subset.

The second signal may also be a speech signal, and the second signal may be recognised by reference to recognition data representing the letters of the alphabet, either individually or as sequences. Alternatively the second signal may be a signal consisting of tones generated by a keypad.

According to another aspect of the invention, a method of speech recognition comprises

(a) receiving a speech signal;

(b) storing the speech signal;

(c) performing a recognition operation on the speech signal or some other signal;

(d) in the event of the recognition operation failing to meet a predetermined criterion of reliability, retrieving the stored speech signal and performing a recognition operation thereon.
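Steps (a) to (d) of this last method might be sketched as below. This is only an illustration: `toy_recognise` stands in for a real speech recogniser by comparing text with `difflib`, the "speech signal" is simply encoded text, and the threshold value and all function names are assumptions.

```python
from difflib import SequenceMatcher


def toy_recognise(signal, vocabulary):
    """Stand-in recogniser: returns (best word, similarity in 0..1).
    A real system would score audio, not decoded text."""
    text = signal.decode("utf-8")
    scored = ((w, SequenceMatcher(None, text, w).ratio()) for w in vocabulary)
    return max(scored, key=lambda pair: pair[1])


def recognise_with_fallback(signal, restricted, full, threshold=0.99):
    stored = bytes(signal)                           # (b) store the speech signal
    word, score = toy_recognise(stored, restricted)  # (c) first recognition
    if score >= threshold:                           # reliability criterion met
        return word, score
    # (d) retrieve the stored signal and recognise it again, here
    # against a wider vocabulary (one possible second operation).
    return toy_recognise(stored, full)


RESTRICTED = ["ipswich", "harwich"]
FULL = RESTRICTED + ["norwich"]
result = recognise_with_fallback(b"norwich", RESTRICTED, FULL)
```

Because the signal is stored, the caller is not asked to repeat the word when the first recognition proves unreliable.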

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 shows schematically the architecture of a directory enquiry system;

FIG. 2 is a flow chart illustrating the operation of the directory enquiry system of FIG. 1;

FIG. 2a is a flow chart illustrating a second embodiment of operation of the directory enquiry system of FIG. 1;

FIG. 3 is a flow chart illustrating the use of CLI in the operation of the directory enquiry system of FIG. 1;

FIG. 3a includes a further information gathering step for use in the operation of the directory enquiry system of FIG. 1;

FIG. 4 is a flow chart illustrating a further mode of operation of the directory enquiry system of FIG. 1.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The embodiment of the invention now to be described addresses the same directory enquiry task as was discussed in the introduction. It operates by firstly asking an enquirer for a town name and, using a speech recogniser, identifies as "possible candidates" two or more possible town names. It then asks the enquirer for a road name and recognition of the reply to this question then proceeds by reference to stored data pertaining to all road names which exist in any of the candidate towns. Similarly, the surname is asked for, and a recognition stage then employs recognition data for all candidate road names in candidate towns. The number of candidates retained at each stage can be fixed, or (preferably) all candidates meeting a defined acceptance criterion, e.g. having a recognition score above a defined threshold, may be retained.

Before describing the process in more detail, the architecture of a directory enquiry system will be described with reference to FIG. 1. A speech synthesiser 1 is provided for providing announcements to a user via a telephone line interface 2, by reference to stored, fixed messages in a message data store 3, or from variable information supplied to it by a main control unit 4. Incoming speech signals from the telephone line interface 2 are conducted to a speech recogniser 5 which is able to recognise spoken words by reference to, respectively, town name, road name or surname recognition data in recognition data stores 6, 7, 8.

A main directory database 9 contains, for each telephone subscriber in the area covered by the directory enquiry service, an entry containing the name, address and telephone number of that subscriber, in text form. The town name recognition data store 6 contains, in text form, the names of all the towns included in the directory database 9, along with stored data to enable the speech recogniser 5 to recognise those town names in the speech signal received from the telephone line interface 2. In principle, any type of speech recogniser may be used, but for the purposes of the present description it is assumed that the recogniser 5 operates by recognising distinct phonemes in the input speech, which are decoded by reference to stored data in the store 6 representing a decoding tree structure constructed in advance from phonetic translations of the town names stored in the store 6, decoded by means of a Viterbi algorithm. The stores 7, 8 for road name recognition data and surname recognition data are organised in the same manner. Although, for example, the surname recognition data store 8 contains data for all the surnames included in the directory database 9, it is configurable by the control unit 4 to limit the recognition process to only a subset of the names, typically by flagging the relevant parts of the recognition data so that the "recognition tree" is restricted to recognising only those names within a desired subset of the names.

This enables the `recognition tree` to be built before the call commences and then manipulated during the call. By restricting the active subset of the tree, computational resources can be concentrated on those words which are most likely to be spoken. This reduces the chances that an error will occur in the recognition process, in those cases where one of these most likely words has been spoken.

Each entry in the town data store 6 contains, as mentioned above, text corresponding to each of the town names appearing in the database 9, to act as a label to link the entry in the store 6 to entries in the database 9 (though other kinds of label may be used if preferred). If desired, the store 6 may contain an entry for every town name that the user might use to refer to geographical locations covered by the database, whether or not all these names are actually present in the database. Noting that some town names are not unique (there are four towns in the UK called Southend), and that some town names carry the same significance (e.g. Hammersmith, which is a district of London, means the same as London as far as entries in that district are concerned), an equivalence data store 39 is also provided, containing such equivalents, which can be consulted following each recognition of a town name, to return additional possibilities to the set of town names considered to be recognised. For example if "Hammersmith" is recognised, London is added to the set; if "Southend" is recognised, then Southend-on-Sea, Southend (Campbeltown), Southend (Swansea) and Southend (Reading) are added.
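The consultation of the equivalence data store 39 could be sketched as a simple lookup that widens the set of recognised town names. The dictionary layout and the function name `expand_towns` are assumptions made for illustration; the town names are those of the example above.

```python
# Hypothetical in-memory form of the equivalence data store 39.
EQUIVALENTS = {
    "Hammersmith": {"London"},
    "Southend": {
        "Southend-on-Sea",
        "Southend (Campbeltown)",
        "Southend (Swansea)",
        "Southend (Reading)",
    },
}


def expand_towns(recognised, equivalents=EQUIVALENTS):
    """Return the recognised names plus any equivalents held for them."""
    expanded = set(recognised)
    for name in recognised:
        expanded |= equivalents.get(name, set())
    return expanded


towns = expand_towns({"Hammersmith"})
```

Names with no stored equivalents pass through unchanged, so the lookup can be applied after every town name recognition.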

The equivalence data store 39 could, if desired, contain similar information for roads and surnames, or first names if these are used; for example Dave and David are considered to represent the same name.

As an alternative to this structure, the vocabulary equivalence data store 39 may act as a translation between labels used in the name stores 6, 7, 8 and the labels used in the database (whether or not the labels are names in text form).

The use of text to define the basic vocabulary of the speech recogniser requires that the recogniser can relate one or more textual labels to a given pronunciation. That is to say in the case of a `recognition tree`, each leaf in the tree may have one or more textual labels attached to it. If the restriction of the desired vocabulary of a recogniser is also defined as a textual list, then the recogniser should preferably return only textual labels in that list, not labels associated with a pronunciation associated with a label in the list that are not themselves in the list.

The system operation is illustrated by means of the flowchart set out in FIG. 2. The process starts (10) upon receipt of an incoming telephone call signalled to the control unit 4 by the telephone line interface 2; the control unit responds by instructing the speech synthesiser 1 to play (11) a message stored in the message store 3 requesting the caller to give the name of the required town. The caller's response is received (12) by the recogniser. The recogniser 5 then performs its recognition process (13) with reference to the data stored in the store 6 and communicates to the control unit 4 the name of the town which most clearly resembles the received reply or (more preferably) the names of all those towns which meet a prescribed threshold of similarity with the received reply. We suppose (for the sake of this example) that four town names meet this criterion. The control unit 4 responds by instructing the speech synthesiser to play (14) a further message from the message data store 3 and meanwhile accesses (15) the directory database 9 to compile a list of all road names which are to be found in any of the geographical locations corresponding to those four town names and also any additional location entries obtained by accessing the equivalence data store 39. It then uses (16) this information to update the road name recognition data store 7 so that the recogniser 5 is able to recognise only the road names in that list.

The next stage is that a further response, relating to the road name, is received (17) from the caller and is processed by the recogniser 5 utilising the data store 7; suppose that five road names meet the recognition criterion. The control unit 4 then instructs the playing (19) of a further message asking for the name of the desired telephone subscriber and meanwhile (20) retrieves from the database 9 a list of the surnames of all subscribers residing in roads having any of the five road names in any of the four geographical locations (and any equivalents), and updates the surname recognition data store 8 in a similar manner as described above for the road name recognition data store. Once the user's response is received (22) by the recogniser, the surname may be recognised (23) by reference to the data in the surname recognition data store.

It may of course be that more than one surname meets the recognition criterion; in any event, the database 9 may contain more than one entry for the same name in the same road in the same town. Therefore at step 24 the number of directory entries which have one of the recognised surnames and one of the recognised road names and one of the recognised town names is tested. If the number is manageable, for example if it is three or fewer, the control means instructs (25) the speech synthesiser to play an announcement from the message data store 3, followed by recitation of the name, address and telephone number of each entry, generated by the speech synthesiser 1 using text-to-speech synthesis, and the process is complete (26). If, on the other hand, the number of entries is excessive then further steps 27, to be discussed further below, will be necessary in order to meet the caller's enquiry.
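The list compilation of steps 15 and 20 and the entry count test of step 24 might be sketched as follows. The recogniser itself is left out; only the database side is shown. The `Entry` type, the sample directory (including its made-up telephone numbers) and the function names are hypothetical, and returning `None` for an excessive number of entries is merely one way of signalling that the further steps 27 are needed.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    surname: str
    road: str
    town: str
    number: str


# Made-up sample data standing in for the directory database 9.
DIRECTORY = [
    Entry("Smith", "High Street", "Ipswich", "01473 000001"),
    Entry("Smith", "High Street", "Norwich", "01603 000002"),
    Entry("Jones", "Mill Lane", "Ipswich", "01473 000003"),
    Entry("Brown", "Castle Road", "Harwich", "01255 000004"),
]


def roads_in(towns, directory=DIRECTORY):
    """Step 15: every road name found in any candidate town."""
    return {e.road for e in directory if e.town in towns}


def surnames_in(towns, roads, directory=DIRECTORY):
    """Step 20: every surname found in a candidate road of a candidate town."""
    return {e.surname for e in directory if e.town in towns and e.road in roads}


def matching_entries(towns, roads, surnames, directory=DIRECTORY, limit=3):
    """Step 24: entries matching all three recognised sets, or None if
    there are more than `limit` of them."""
    hits = [e for e in directory
            if e.town in towns and e.road in roads and e.surname in surnames]
    return hits if len(hits) <= limit else None


towns = {"Ipswich", "Norwich"}
road_list = roads_in(towns)
surname_list = surnames_in(towns, {"High Street"})
entries = matching_entries(towns, {"High Street"}, {"Smith"})
```

Each list would be handed to the recogniser as its permitted vocabulary before the corresponding reply is processed.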

It will be seen that the process described will have a lower failure rate than a system which chooses only a single candidate town, road or surname at each stage of the recognition process, since by retaining second and further choice candidates the possibility of error due to mis-recognition is reduced, though there is increased risk of recognition error due to the larger vocabulary. A penalty for this increased reliability is of course increased computation time, but by ensuring that the road name and surname recognition processes are conducted over only a limited number of the total number of road names and surnames in the database, the computation can be kept to manageable proportions.

Moreover, compared with a system in which a second-stage recognition is unconstrained by the results of a previous recognition (e.g. one where the `road` recognition process is not limited to roads in towns already recognised), the proposed system would, when using recognisers (such as those using Hidden Markov Models) which internally "prune" intermediate results, be less liable to prune out the desired candidate in favour of other candidate roads from unwanted towns.

It will be seen too, that the number of possible lists will, in most applications, be so large as to prohibit their preparation in advance, and hence the construction of the list is performed as required. Where the recogniser is of the type (e.g. recognisers using Hidden Markov Models) which requires setting up for a particular vocabulary, there are two options for updating the relevant store to limit the recogniser's operation to words in the list. One is to start with a fully set-up recogniser, and disable all the words not in the list; the other is to clear the relevant recognition data store and set it up afresh (either completely, or by adding words to a permanent basic set). It should be noted that some recognisers do not store recognition data for all words which may be recognised. These recognisers generally have a store of textual information relating to the words that may be recognised but do not prestore data to enable the speech recogniser to recognise words in a received signal. In such so-called "dynamic recognisers" the recognition data is generated either immediately before or during recognition.

The first option requires large data stores but is relatively inexpensive computationally for any list size. The second option is generally computationally expensive for large lists but requires much smaller data stores and is useful when there are frequent data changes. Generally the first option would be preferred, with the second option being invoked in the case of a short list, or where the data change frequently.
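The two options can be contrasted in a short sketch. Here `build_model` is a hypothetical placeholder for whatever recognition data a real recogniser would need per word, and the class and function names are likewise assumptions.

```python
def build_model(word):
    """Placeholder for per-word recognition data (assumed, not a real model)."""
    return tuple(word.lower())


class FlaggedStore:
    """Option one: hold data for every word and flag the active subset."""

    def __init__(self, words):
        self.data = {w: build_model(w) for w in words}  # built once, in advance
        self.active = set(self.data)

    def restrict(self, allowed):
        # Cheap per call: only the flags change, no data is rebuilt.
        self.active = set(allowed) & self.data.keys()


def rebuild_store(allowed):
    """Option two: clear the store and set it up afresh for the list."""
    return {w: build_model(w) for w in allowed}  # cost grows with list size


store = FlaggedStore(["High Street", "Mill Lane", "Castle Road"])
store.restrict(["Mill Lane", "Quay Street"])
fresh = rebuild_store(["Mill Lane"])
```

The flagged store keeps all three models in memory whichever list is active, whereas the rebuilt store holds only what the current list requires.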

The criterion for limiting the number of recognition `hits` at steps 13, 18 or 23 may be that all candidates are retained which meet some similarity criterion, though other criteria such as retaining always a fixed number of candidates may be chosen if preferred. It may be, in the earlier recognition stages, that the computational load and effect on recognition performance of retaining a large town (say) with a low score is not considered to be justified, whereas retaining a smaller town with the same score might be. In this case the scores of a recognised word may be weighted by factors dependent on the number of entries referencing that word, in order to achieve such differential selection.
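One possible form of such weighting is sketched below. The text only states that the factor depends on the number of entries referencing the word; the logarithmic penalty, the value of `alpha`, the threshold and the town names and entry counts are all assumptions chosen for illustration.

```python
import math


def weighted_score(score, entry_count, alpha=0.1):
    """Scale a recognition score down as the number of directory entries
    referencing the word grows (assumed logarithmic penalty)."""
    return score / (1.0 + alpha * math.log10(max(entry_count, 1)))


def retain(candidates, entry_counts, threshold=0.35):
    """Keep the candidates whose weighted score meets the threshold."""
    return [town for town, score in candidates
            if weighted_score(score, entry_counts[town]) >= threshold]


# Hypothetical towns with equal raw scores but very different sizes.
counts = {"Bigtown": 1_000_000, "Littleton": 1_000}
kept = retain([("Bigtown", 0.5), ("Littleton", 0.5)], counts)
```

With these figures the smaller town survives the threshold while the larger one, which would add far more road names to the next vocabulary, does not.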

In the examples discussed above, a list of words (such as road names) to be recognised is generated based on the results of an earlier recognition of a word (the town name). However it is not necessary that the units in the earlier recognition step or in the list be single words; they could equally well be sequences of words. One possibility is a sequence of the names of the letters of the alphabet; for example, a list of words for a town name recognition step may be prepared from an earlier recognition of the answer to the question "please spell the first four letters of the town name." If recording facilities are provided (as discussed further below) it is not essential that the order of recognition be the same as the order of receipt of the replies (it being more natural to ask for the spoken word first, followed by the spelled version, though it is preferred to process them in the opposite sequence).

It is assumed in the above description that the recognisers always produce a result, i.e. that the town (etc) name or names which give the nearest match(es) to the received response are deemed to have been recognised. It would of course be possible to permit output of a "fail" message in the event that a reasonably accurate match was not found. In this case further action may be desired. This could simply be switching the call to a manual operator. Alternatively further information may be processed automatically as shown in FIG. 2a. In this example a low confidence match 40 has still resulted in four possible candidate towns. Because of the questionable accuracy of this match a further message is played to the caller asking for an additional reply which may be checked against existing recognition results. In the example, a spelling of the town name is requested 41 allowing all permissible spellings of all town names in the recognition vocabulary. Following a confident recognition 43 two spellings are recognised. These two town names may be considered more confident than the four spoken town names recognised previously, but a comparison 44 of both lists may reveal one or more common town names in both lists. If this is so 46 then a very high confidence of success may be inferred for these common town names and the enquiry may proceed, for example, in the same manner as FIG. 2 using these common towns to prepare the road name recognition 15. If no common town names are found then the two spelt towns may be retained 47 for use in the next stage which may be preparing the road name recogniser 15 with the two town names as shown in the diagram, or may be a different processing step not shown in FIG. 2a, for example a confirmation of the more confident of the two town names with the user in order to increase the system confidence before a subsequent request for information is made.
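The comparison 44 and its two outcomes (46 and 47) reduce to a set intersection with a fallback, which might be sketched as follows. The function name, the confidence labels and the sample town names are assumptions.

```python
def combine_candidates(spoken, spelt):
    """Compare the spoken and spelt candidate lists (comparison 44).
    Common names are kept with high confidence (46); otherwise the
    spelt names are retained (47)."""
    common = set(spoken) & set(spelt)
    if common:
        return common, "high"
    return set(spelt), "spelt-only"


spoken = ["Ipswich", "Norwich", "Harwich", "Sandwich"]  # low-confidence match
spelt = ["Norwich", "Northwich"]                        # confident spellings
towns, confidence = combine_candidates(spoken, spelt)
```

The resulting set would then be used to prepare the road name vocabulary, or be confirmed with the caller when only the spelt names remain.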

It is not necessary that the responses to be recognised be discrete responses to discrete questions. They could be words extracted by a recogniser from a continuous sentence, for systems which work in this way.

Another situation in which it may be desired to vary the scope of the speech recogniser's search is where it can be modified on the basis not of previous recogniser results but of some external information relevant to the enquiry. In a directory enquiry system this may be a signal indicating the origin of a telephone call, such as the calling line identity (CLI) or a signal identifying the originating exchange. In a simple implementation this may be used to restrict town name recognition to those town names located in the same or an adjacent exchange area to that of the caller. In a more sophisticated system this identification of the calling line or exchange may be used to access stored information compiled to indicate the enquiry patterns of the subscriber in question or of subscribers in that area (as the case may be).

For example, a sample of directory enquiries in a particular area might show that 40% of such calls were for numbers in the same exchange area and 20% for immediately adjacent areas. Separate statistical patterns might be compiled for business or residential lines, or for different times of day, or other observed trends such as global usage statistics of a service that are not related to the nature or location of the originating line.

The effect of this approach can be to improve the system reliability for common enquiries at the expense of uncommon ones. Such a system thus aims to automate the most common or straightforward enquiries, with other calls being dealt with in an alternative manner, for example being routed to a human operator.

As an example, FIG. 1 additionally shows a CLI detector 20 (used here only to indicate the originating exchange), which is used to select from a store 21 a list of likely towns for enquiries from that exchange, to be used by the control unit 4 to truncate the "town name" recognition, as indicated in the flowchart of FIG. 3, where the calling line indicator signal is detected at step 10a, and selects (12a) a list of town names from the store 21 which is then used (12b) to update the town name recognition store 6 prior to the town name recognition step 13. The remainder of the process is not shown as it is the same as that given in FIG. 2.
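Steps 10a, 12a and 12b might be sketched as a lookup keyed on the originating exchange. It is assumed here, purely for illustration, that the exchange can be read from a fixed-length prefix of the CLI and that an unknown exchange falls back to the full town vocabulary; the contents of the stand-in for store 21 are invented.

```python
# Hypothetical contents of store 21: likely towns per originating exchange.
LIKELY_TOWNS = {
    "01473": ["Ipswich", "Felixstowe", "Woodbridge"],
    "01603": ["Norwich", "Wymondham"],
}

ALL_TOWNS = ["Ipswich", "Felixstowe", "Woodbridge", "Norwich", "Wymondham", "Harwich"]


def towns_for_call(cli, prefix_len=5):
    """Step 12a: select the town list for the caller's exchange.
    Unknown exchanges get the unrestricted vocabulary (an assumption)."""
    exchange = cli[:prefix_len]
    return LIKELY_TOWNS.get(exchange, ALL_TOWNS)


vocabulary = towns_for_call("01473555010")  # step 12b would load this into store 6
```

The same lookup could equally be keyed on the dialled number or on compiled usage statistics rather than on the CLI.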

An extension of this approach is to improve the system reliability and speed for common enquiries, whilst using additional information to enable the less common enquiries to succeed. Thus the less common enquiries are still able to succeed but require more effort and information to be supplied by the caller than the common enquiries require.

As an example consider FIG. 3a. The spoken town name is asked for 11, and the CLI is detected 10a. As in FIG. 3, the CLI is then related to town names commonly requested by callers with that CLI identity 12a. These town names update the spoken town name store 12b. This process is identical to that shown in FIG. 3 so far. Additionally, as the speech is gathered for recognition it is stored for later re-recognition 37. The restricted town name set used in the recognition 13 will typically be a small vocabulary covering a significant proportion of enquiries. If a word within this vocabulary is spoken and confidently recognised 48 then the enquiry may immediately use this recognised town or towns to prepare the road name store and continue as described in FIG. 2.

If the word is recognised as being outside of the vocabulary or of poor confidence then an additional message 49 is played to ask the caller for more information, which in this case is the first four letters of the town name. Simultaneously, an additional re-recognition of the spoken town name 53 may be performed which can recognise any of the possible town names in the directory. In this example we assume that four town names are recognised 54. At the same time, the caller may be spelling in the first four letters of the town name 50 and two spellings 51 have been confidently recognised. These two spellings are then expanded to the full town names which match them 52. It may be necessary to anticipate common spelling errors, additional or missing letters, abbreviations, and punctuation in the preparation of the spelling vocabulary, and the subsequent matching of the spelt recognition results to the full town names. Assume in this example that five town names match the two spellings.
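The expansion 52 of recognised spellings into full town names might be sketched as a prefix match over normalised names. Only punctuation, spacing and letter case are handled here; tolerance of spelling errors or of missing letters is not attempted. The function names and the sample town list are assumptions.

```python
def normalise(name):
    """Upper-case the name and drop everything except letters, so that
    hyphens, spaces and punctuation do not prevent a match."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def expand_spellings(spellings, towns):
    """Return the towns whose normalised name begins with any of the
    recognised letter sequences (empty sequences are ignored)."""
    keys = {k for k in (normalise(s) for s in spellings) if k}
    return [t for t in towns
            if any(normalise(t).startswith(k) for k in keys)]


TOWNS = ["Southend-on-Sea", "Southampton", "South Shields", "Norwich", "Northampton"]
matches = expand_spellings(["SOUT", "NORT"], TOWNS)
```

The expanded list can then be compared with the re-recognised spoken names, or used directly as a restricted vocabulary for a further recognition of the stored speech.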

A comparison 55 identical in purpose to that described in FIG. 2a (44) may then be performed between the five town names derived from the two spellings and the four re-recognised town names. If common words are found in these two sets (only one common word is assumed in this example), then this town name may confidently be assumed to be the correct one and the road name recognition data store 7 may be prepared from it and the enquiry proceeds as shown in FIG. 2.
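By way of illustration only, the expansion of recognised spellings 52 and the comparison 55 might be sketched as follows. The town names, the function names and the use of simple prefix matching are assumptions made for this sketch; a practical system would also need to allow for the spelling errors and abbreviations mentioned above.

```python
def expand_spellings(spellings, directory_towns):
    """Return every directory town whose name begins with a recognised spelling."""
    prefixes = tuple(s.upper() for s in spellings)
    return {town for town in directory_towns if town.upper().startswith(prefixes)}


def compare(spelt_towns, rerecognised_towns):
    """Return the town names common to both sets (the comparison step)."""
    return set(spelt_towns) & set(rerecognised_towns)


# Hypothetical directory contents and recognition results.
directory_towns = ["Northwich", "Northampton", "Norton", "Harwich",
                   "Harwell", "Norwich", "Nantwich", "Ipswich"]
spellings = ["NORT", "HARW"]                       # two confident spellings
rerecognised = ["Norwich", "Nantwich", "Northwich", "Ipswich"]  # four spoken candidates

spelt_towns = expand_spellings(spellings, directory_towns)  # five towns
common = compare(spelt_towns, rerecognised)

if common:
    # Confident result: use it to prepare the road name store.
    selected_towns = common
else:
    # No agreement: fall back to re-recognising against the spelt towns only.
    selected_towns = spelt_towns
```

In this sketch a single town survives the comparison, which corresponds to the case assumed in the example above.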

In other cases, the spoken recognition 53 will be in error and no common words will be found. Alternatively, the recognition of the town name 53, and its subsequent comparison 55, may be considered optional and omitted. In both of these instances the spoken town store will be updated 57 with the five towns derived from the two spellings 52 and the spoken town name re-recognised again 58. In the example, it is assumed that a single confident town name was recognised. This town name may be used to configure the road name recognition data store 7 and the enquiry proceeds as shown in FIG. 2.

The deliberate restriction of a vocabulary to only the very most likely words as described above need not necessarily depend on CLI. The preparation of the road name vocabulary based on the recognised town names is itself an example of this, and the approach of asking for additional information, as shown in FIG. 3a, may be used if any such restricted recognition results are not confident. Global observed or postulated behaviour can also be used to restrict a vocabulary (e.g. the town store) in a similar way to CLI derived information, as can signals indicating the destination of a call. For example, callers may be encouraged to dial different access numbers for particular information. On receipt of a call by a common apparatus for all the information, the dialed number determines the subset of the vocabulary to be used in subsequent operation of the apparatus. The operation of the apparatus would then continue similarly as described above with relation to CLI.

Additionally, the re-recognition of a gathered word that has been constrained by additional information such as the four letter spelling in FIG. 3a could be based on any kind of information, for example DTMF entry via the telephone keypad, or a yes/no response to a question restricting the scope of the search (e.g. "Please say yes or no: does the person live in a city?"). This additional information could even be derived from the CLI using a different area store 21 based on different assumptions to the previously used one.

In the above described embodiment, no account is taken of the relative probability of recognition. For example, if the town recognition step 13 recognises town names Norwich and Harwich, then when, at road recognition step 18, the recogniser has to evaluate the possibility that the caller said "Wright Street" (which we suppose to be in Norwich) or "Rye Street" (in Harwich), no account is taken of the fact that the spoken town bore a closer resemblance to "Norwich" than it did to "Harwich". If desired however, the recogniser may be arranged to produce (in known manner) figures or "scores" indicating the relative similarity of each of the candidates identified by the recogniser to the original utterance and hence the supposed probability of it being the correct one. These scores may then be retained whilst a search is made in the directory database to derive a list of the vocabulary items of the next desired vocabulary that are related to the recognised words. These new vocabulary items may then be given the scores that the corresponding matching word attained. In the case where a word came from a match with more than one recognised word of the previous vocabulary, the maximum of the scores may be selected, for example. These scores may then be fed as a priori probabilities to the next recognition stage to bias the selection. This may be implemented in the process depicted in FIG. 2 as follows.

Step 13. The recogniser produces for each town, a score--e.g.

Harwich 40%

Norwich 25%

Nantwich 20%

Northwich 15%

Step 15. When the road list is compiled the appropriate score is appended to the road name, e.g.

Wright Street 25%

Rye Street 40%

North Street (assumed to exist in both Norwich and Nantwich) 25%

and stored in the store 7.

Step 18. When the recogniser comes to recognise the road name, it may pre-weight the recognition network (for example in the case of Hidden Markov Models) with the scores from store 7. It then recognises the supplied word, with the resulting effect that these weights make the more likely words less likely to be prematurely pruned out. Alternatively, the recogniser may recognise the utterance, and adjust its resulting scores after recognition according to the contents of store 7. This second option provides no benefit to the pattern matching process, but both options propagate the relative likelihood of an entry finally being selected from vocabulary to vocabulary. For example, considering the post-weighted option, if the recogniser would have assigned the scores of 60%, 30% and 10% to Wright Street, Rye Street and North Street respectively then the weighted scores would be:

Wright Street (Norwich) 25%×60%=15%

Rye Street (Harwich) 40%×30%=12%

North Street (Norwich and Nantwich) 25%×10%=2.5%

Similar modification would of course occur for the steps 20, 21, 23. This is just one example of a scheme for score propagation.
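Purely as an illustrative sketch, the score propagation of steps 13, 15 and 18 (post-weighted option) might be expressed as follows. The function names and the dictionary layout are assumptions of this sketch; the town and road scores are those of the example above, and the maximum score is taken where a road is found in more than one recognised town.

```python
def propagate_scores(town_scores, roads_by_town):
    """Step 15: give each road the best score of the recognised towns containing it."""
    road_priors = {}
    for town, score in town_scores.items():
        for road in roads_by_town.get(town, []):
            road_priors[road] = max(score, road_priors.get(road, 0.0))
    return road_priors


def post_weight(road_priors, recogniser_scores):
    """Step 18 (post-weighted option): multiply each road score by its stored prior."""
    return {road: road_priors[road] * score
            for road, score in recogniser_scores.items()
            if road in road_priors}


# Step 13 output from the example.
town_scores = {"Harwich": 0.40, "Norwich": 0.25, "Nantwich": 0.20, "Northwich": 0.15}

# Directory relationships assumed in the example (Northwich has no listed road here).
roads_by_town = {
    "Norwich": ["Wright Street", "North Street"],
    "Harwich": ["Rye Street"],
    "Nantwich": ["North Street"],
}

road_priors = propagate_scores(town_scores, roads_by_town)   # contents of store 7
recogniser_scores = {"Wright Street": 0.60, "Rye Street": 0.30, "North Street": 0.10}
weighted = post_weight(road_priors, recogniser_scores)
best_road = max(weighted, key=weighted.get)
```

With the figures of the example, the weighted values agree with those listed above, and Wright Street is selected even though Harwich was the highest scoring town.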

The possibility of switching to a manual operator in the event of a "failure" condition has already been mentioned. Alternatively a user could simply be asked to repeat the action that has not been recognised. However, further automated steps may be taken under failure conditions.

A failure condition can be identified by noting low recogniser output "scores", or excessive numbers of recognised words all having similar scores (whether by reference to local scores or to weighted scores), or by comparing the scores with those produced by a recogniser comparing the speech to out-of-vocabulary models. Such a failure condition may arise in an unconstrained search like that of the town name recognition of step 13 in FIG. 2. In this case it may be that better results might be obtained by performing (for example) the road name recognition step first (unconstrained) and compiling a list of all town names containing the roads found, to constrain a subsequent town name recognition step. Or it may arise in a constrained search such as that of step 13 in FIG. 3 or steps 18 and 23 in FIG. 2, where perhaps the constraint has removed the correct candidate from the recognition set; in this case removing the constraint, or applying a different one, may improve matters.
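The three failure tests just described might, for example, be combined as in the following sketch. The function name, the parameter names and all threshold values are hypothetical and chosen only for illustration; suitable values would depend on the recogniser used.

```python
def is_failure(scores, oov_score=None, min_top=0.5, margin=0.05, max_close=3):
    """Return True if the recognition result should be treated as a failure.

    scores    -- mapping of recognised word to recogniser score (local or weighted)
    oov_score -- optional score of an out-of-vocabulary model for the same speech
    min_top, margin, max_close -- illustrative thresholds (assumptions)
    """
    if not scores:
        return True
    ranked = sorted(scores.values(), reverse=True)
    top = ranked[0]
    # Test 1: the best score is itself low.
    if top < min_top:
        return True
    # Test 2: an excessive number of words score about as well as the best one.
    close = sum(1 for s in ranked if top - s <= margin)
    if close > max_close:
        return True
    # Test 3: the out-of-vocabulary model matches at least as well as the best word.
    if oov_score is not None and oov_score >= top:
        return True
    return False
```

A caller of such a function might, on a True result, remove or change the constraint, reorder the recognition steps, or transfer the enquiry to a manual operator.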

Thus one possible approach is to make provision for recording the caller's responses, and in the event of failure, reprocessing them using the steps set out in FIG. 2 (except the "play message" steps 11, 14, 19) but with the original sequence town name/road name/surname modified. There are of course six permutations of these. One could choose that one (or more) of these which experience shows to be the most likely to produce an improvement. The result of such a reprocessing could be used alone, or could be combined with the previous result, choosing for output those entries identified by both processes.
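As a small illustration of the orderings available for such reprocessing, the following sketch enumerates the six permutations and sets aside the original sequence. The variable names are assumptions of the sketch; the choice among the remaining five orderings would rest on operational experience, which is not modelled here.

```python
from itertools import permutations

# Original question (and processing) sequence of FIG. 2.
ORIGINAL_ORDER = ("town name", "road name", "surname")

# All possible processing sequences for the three recorded responses.
all_orders = list(permutations(ORIGINAL_ORDER))

# Candidate sequences for reprocessing after a failure.
alternative_orders = [order for order in all_orders if order != ORIGINAL_ORDER]
```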

Another possibility is to perform an additional search omitting one stage, and comparing the results as for the `spelled input` case.

If desired, processing using two (or more) such sequences could be performed routinely (rather than only under failure conditions); to reduce delays an additional sequence might commence before completion of the first; for example (in FIG. 4) an additional, unconstrained "road name" search 30 could be performed (without recording the road name) during the "which surname" announcement. From this, a list of surnames is compiled (31) and the surname store updated (32). Once the surnames from the list have been recognised (33) a town name list may be compiled (34) and the town name store updated (35). Then at step 36 the spoken town name, previously stored at step 37 may be recognised. The results of the two recognition processes may then be compiled, suitably by selecting (38) those entries which are identified by both processes. Alternatively, if no common entries are found, the entries found by one or the other or both of the processes may be used. The remaining steps shown in FIG. 4 are identical to those in FIG. 2.
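The selection step 38 might be sketched as follows. The entry identifiers are invented for the example, and the fallback shown (using the entries found by either process) is only one of the alternatives mentioned above.

```python
def combine_results(first_entries, second_entries):
    """Select entries identified by both processes; if none, use those found by either."""
    first, second = set(first_entries), set(second_entries)
    common = first & second
    return common if common else first | second


# Hypothetical directory entry identifiers found by the two branches of FIG. 4.
left_branch = {"entry-17", "entry-42"}
right_branch = {"entry-42", "entry-90"}

selected = combine_results(left_branch, right_branch)
```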

The technique of storing an utterance and using it in a restricted-vocabulary recognition process following recognition of a later utterance has been described as an option to be used alongside sequential processing, as a cross-check or to provide additional recognition results to be used in the case of difficulty. However, it may be used alone, for example in circumstances where one chooses to have the questions asked in a sequence which seems natural to the user, so as to improve speed and reliability of response, but to process the answers in a sequence which is more suited to the nature of the data. For example in FIG. 4, the right hand branch only could be used (but with steps 14, 17, 19 and 22 retained to feed it), i.e. omit steps 15, 16, 18, 20, 21, 23, 38.

The use of CLI to modify the expectations of a speech service need not be restricted to the modification of expected vocabulary items as already described. Enquiry systems that require a certain level of security or personal identification may also use CLI to their advantage. The origin of the telephone call as given by the CLI may be used to extract from a store the identity of a number of individuals known to the system to be related to this origin. This store may also contain representative speech which is already verified to have come from these individuals. If there is only one individual authorised to access the given service from the designated origin, or the caller has made a specific claim to identity by means of additional information (e.g. a DTMF or spoken personal identification number) then a spoken utterance may be gathered from the caller and compared with the stored speech patterns associated with that claimed identity in order to verify that the person is who they say that they are. Alternatively, if there are a number of individuals associated with the call origin, the identity of the caller may be determined by gathering a spoken utterance from the caller and comparing it with stored speech patterns for each of the individuals in turn, selecting the most likely candidate that matches with a certain degree of confidence.
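The verification and identification cases just described might be sketched as below. Everything in the sketch is an assumption made for illustration: the stored speech patterns are reduced to short feature tuples, the similarity measure is a toy function of Euclidean distance, and the confidence threshold is arbitrary. A real system would use a proper speaker verification technique.

```python
import math


def match_score(utterance, template):
    """Toy similarity in (0, 1]: 1.0 for identical feature vectors (assumption)."""
    return 1.0 / (1.0 + math.dist(utterance, template))


def identify_caller(candidates, utterance, claimed=None, threshold=0.8):
    """Return the accepted speaker name, or None if no candidate is confident enough.

    candidates -- stored patterns for the individuals linked to the call origin (CLI)
    claimed    -- identity claimed by the caller, e.g. via a PIN, if any
    """
    if claimed is None and len(candidates) == 1:
        claimed = next(iter(candidates))          # only one authorised individual
    if claimed is not None:
        # Verification: compare only against the claimed identity.
        if claimed not in candidates:
            return None
        ok = match_score(utterance, candidates[claimed]) >= threshold
        return claimed if ok else None
    # Identification: compare with each individual in turn and keep the best.
    best = max(candidates, key=lambda s: match_score(utterance, candidates[s]), default=None)
    if best is None or match_score(utterance, candidates[best]) < threshold:
        return None
    return best


# Hypothetical store contents for one call origin.
household = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}
spoken = (0.9, 0.1)
caller = identify_caller(household, spoken)
```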

The CLI may also be used to access a store relating speech recognition models to the origin of the call. These speech models may then be loaded into the stores used by the speech recogniser. Thus, a call originating from a cellular telephone, for example, may be dealt with using speech recognition models trained using cellular speech data. A similar benefit may be derived for regional accents or different languages in a speech recognition system.
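One simple way of relating the CLI to a set of recognition models is a longest-prefix lookup, as in the following sketch. The number prefixes, the model set names and the default are invented for the example and do not describe any actual numbering plan or model inventory.

```python
# Hypothetical store relating CLI prefixes to stored sets of recognition models.
MODEL_SETS_BY_PREFIX = {
    "07": "cellular_models",
    "0141": "regional_accent_models",
}
DEFAULT_MODEL_SET = "general_landline_models"


def select_model_set(cli):
    """Return the model set for the longest matching CLI prefix, else the default."""
    if cli:
        for prefix in sorted(MODEL_SETS_BY_PREFIX, key=len, reverse=True):
            if cli.startswith(prefix):
                return MODEL_SETS_BY_PREFIX[prefix]
    return DEFAULT_MODEL_SET
```

The selected set would then be loaded into the stores used by the speech recogniser before the first utterance is processed; a call with no CLI available simply receives the default set.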

What is claimed is:
 1. A speech recognition apparatus comprising: a store of data containing entries to be identified and information defining for each entry a connection with a word of a first set of words and a connection with a word of a second set of words; speech recognition means; and control means operable: (a) to control the speech recognition means to identify, by reference to recognition information for the first set of words, as many words of the first set as meet a predetermined criterion of similarity to first received voice signals; (b) upon such identification, to compile a list of all words of the second set which are connected with entries connected also with the identified word(s) of the first set; and (c) to control the speech recognition means to identify, by reference to recognition information for the second set of words, at least one word of the list which resembles second received voice signals.
 2. A speech recognition apparatus as in claim 1, in which: the speech recognition means is operable upon receipt of the first voice signal to generate for each identified word a measure of similarity with the first voice signal, and the control means is operable to generate for each word of the list a measure obtained from the measures for the relevant words of the first set, and the speech recognition means is operable upon receipt of the second voice signal to perform the identification of one or more words of the list in accordance with a recognition process weighted in dependence on the measures generated for the words of the list.
 3. A speech recognition apparatus as in claim 2 in which: the control means is operable to weight the measure for each word of the list by a factor dependent on the number of words of the second set which are connected with entries connected also with the relevant identified word of the first set.
 4. A speech recognition apparatus as in claim 2 in which: the control means is operable to omit from the list those words of the second set having a measure below a predetermined threshold.
 5. A speech recognition apparatus as in claim 1 in which: the apparatus includes a store containing recognition data for all words of the second set, and the control means is operable following the compilation of the list and before recognition of the words of the list, to mark in the recognition data store those items of data therein which correspond to the words not in the list or those which correspond to words which are in the list, whereby the recognition means may ignore all words so marked or, respectively, not marked.
 6. A speech recognition apparatus as in claim 1 in which: the control means is operable following the compilation of the list to generate recognition data for each word of the list.
 7. A speech recognition apparatus as in claim 1 in which: the control means is operable to select for output entries defined as connected both with an identified word of the first set and an identified word of the second set.
 8. A speech recognition apparatus as in claim 1 in which: the store of data also contains information defining for each entry a connection with a word of a third set of words, and the control means is operable: (d) to compile a list of all words of the third set which are connected with entries also connected both with an identified word of the first set and an identified word of the second set; and (e) to control the speech recognition means to identify, by reference to recognition information for the third set of words, at least one word of the list which resembles third received voice signals.
 9. A speech recognition apparatus as in claim 1 including: means to store at least one of the received voice signals, the apparatus being arranged to perform an additional recognition process in which the control means is operable: (a) to control the speech recognition means to identify, by reference to recognition information for one set of words, a plurality of words of that set which meet a predetermined criterion of similarity to the respective received voice signals; (b) to compile an additional list of all words of another set which are connected with entries connected also with the identified words of the one set; and (c) to control the speech recognition means to identify, by reference to recognition information for the other set of words, at least one word of the said additional list which resembles the respective received voice signals.
 10. A speech recognition apparatus as in claim 9 including: means to recognise a failure condition and to initiate the said additional recognition process only in the event of such failure being recognised.
 11. A speech recognition apparatus as in claim 1 further comprising: a telephone line connection; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying a subset of at least one of the said sets of words and to restrict to that subset the operation of the speech recognition means for that set.
 12. A telephone information apparatus comprising: a telephone line connection; a speech recogniser for recognising spoken words received via the telephone line connection, by reference to recognition data representing a set of possible utterances; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying a subset of the set of utterances and to restrict the recogniser operation to that subset.
 13. Apparatus as in claim 12, in which the apparatus includes: a store containing recognition data for all words of the sets, and the control means is operable to mark in the recognition data store those items of data therein which correspond to the words not in the subset or those which correspond to words which are in the subset, whereby the recognition means may ignore all words so marked or, respectively, not marked.
 14. Apparatus as in claim 12, in which: the control means is operable to generate recognition data for each word of the subset.
 15. A telephone apparatus comprising: a telephone line connection; a speech recogniser for determining or verifying the identity of the speaker of spoken words received via the telephone line connection, by reference to recognition data corresponding to a set of possible speakers; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying a subset of the set of speakers and to restrict the recogniser operation to that subset.
 16. A telephone information apparatus comprising: a telephone line connection; a speech recogniser for recognising spoken words received via the telephone line connection, by reference to one of a plurality of stored sets of recognition data; and means responsive to receipt via the telephone line connection of signals indicating the origin or destination of a telephone call to access stored information identifying one of the sets of recognition data and to supply this set to the recogniser.
 17. A telephone information apparatus as in claim 16 in which the stored sets correspond to different languages or regional accents.
 18. A telephone information apparatus as in claim 16 in which at least two of the sets correspond to the characteristics of different types of telephone apparatus.
 19. A telephone information apparatus as in claim 18 in which one of the sets corresponds to the characteristics of a mobile telephone channel.
 20. A speech recognition apparatus comprising: a store defining a first set of words; a store defining a second set of words; a store containing entries to be identified; a store containing information relating each entry to a word of the first set and to a word of the second set; speech recognition means operable upon receipt of a first voice signal to identify as many words of the first set as meet a predetermined recognition criterion; means to generate a list of all words of the second set which are related to an entry to which the identified word(s) of the first set is also related; and speech recognition means operable upon receipt of a second voice signal to identify at least one word of the list.
 21. A recognition apparatus comprising: a store defining a first set of patterns; a store defining a second set of patterns; a store containing entries to be identified; a store containing information relating each entry to a pattern of the first set and to a pattern of the second set; recognition means operable upon receipt of a first input pattern signal to identify as many patterns of the first set as meet a predetermined recognition criterion; means to generate a list of all patterns of the second set which are related to an entry to which an identified pattern of the first set is also related; and recognition means operable upon receipt of a second input pattern signal to identify at least one pattern of the list.
 22. A speech recognition apparatus comprising: (i) a store of data containing entries to be identified and information defining for each entry a connection with a signal of a first set of signals and a connection with a word of a second set of words; (ii) means for identifying a received signal as corresponding to as many of the first set as meet a predetermined criterion; (iii) control means operable to compile a list of all words of the second set which are connected with entries connected also with the identified signal of the first set; and (iv) speech recognition means operable to identify, by reference to recognition information for the second set of words, at least one word of the list which resembles received voice signals.
 23. A speech recognition apparatus as in claim 22 in which: the first set of signals are voice signals representing spelled versions of the words of the second set or portions thereof, and the identifying means includes the speech recognition means operating by reference to recognition information for the said spelled voice signals.
 24. A speech recognition apparatus as in claim 22 in which: the first set of signals are signals consisting of tones and the identifying means is a tone recogniser.
 25. A speech recognition apparatus as in claim 22 in which: the first set of signals are signals indicating the origin or destination of the received signal.
 26. A method of identifying entries in a store of data by reference to stored information defining connections between entries and words, said method comprising: (a) identifying one or more of the said words as present in received voice signals; (b) compiling a list of those of the said words connected with entries connected also with the identified words; and (c) identifying at least one of the words of the list as present in the received voice signals.
 27. A speech recognition apparatus comprising: (a) a store of data containing entries to be identified and information defining for each entry a connection with at least two words; (b) a speech recognition means able to identify by reference to stored recognition information for a defined set of words, at least one word or word sequence which meets some predefined criterion of similarity to a received voice signal; (c) a control means operable: (i) to compile a list of words which are connected with entries connected with a word previously identified by the speech recognition means; and (ii) to control the speech recognition means to identify, by reference to recognition information for the compiled lists, at least one word or word sequence which resembles a further received voice signal.
 28. A method of speech recognition by reference to a stored set of words to be recognised, said method comprising: (a) receiving a speech signal; (b) storing the speech signal; (c) receiving a second signal; (d) compiling a list of words, being a subset of the set of words, as a function of the second signal; (e) applying to the stored speech signal a speech recognition process so as to identify, by reference to the list, at least one word of the subset.
 29. A method as in claim 28 in which the second signal is also a speech signal.
 30. A method as in claim 29 including the step of: recognising the second signal by reference to recognition data representing a letter or sequence of letters of the alphabet.
 31. A method as in claim 28 in which the second signal is a signal consisting of tones generated by a keypad.
 32. A method as in claim 28 in which the second signal indicates the origin or destination of the second signal.
 33. A method of speech recognition comprising: (a) receiving a speech signal; (b) storing the speech signal; (c) performing a recognition operation on the speech signal or some other signal; and (d) in the event of the recognition operation failing to meet a predetermined criterion of reliability, retrieving the stored speech signal and performing a recognition operation thereon.
 34. An interactive voice recognition and response method for identifying at least one stored data base item comprising plural classes of mutually inter-related sub-items, said method comprising: (a) issuing a synthesized voice request for a first speech input representing a first class of sub-item; (b) performing speech recognition of said first speech input to identify at least one potentially corresponding first sub-item; (c) issuing a synthesized voice request for a second speech input representing a second class of sub-item; (d) compiling a list of second sub-items mutually inter-related with said identified first sub-item(s); and (e) performing speech recognition of said second speech input with respect to said compiled list to identify at least one potentially corresponding second sub-item from said list.
 35. A method as in claim 34 wherein steps c and d are at least in part concurrently performed.
 36. A method as in claim 34 wherein the speech recognition of step b is performed with respect to a sub-set of the first class of sub-items.
 37. A method as in claim 36 wherein said sub-set is chosen based on an identified origin or destination location of said first speech input.